feat(runners): add mi325x-vultr launch script #1738
Add runners/launch_mi325x-vultr.sh for the Vultr mi325x fleet. Modeled on launch_mi325x-amds.sh (same SKU, same compute partition, same single-node salloc/import/srun flow and *_mi325x.sh bench invocation), with the two cluster-specific paths:

- enroot cache (import layer cache + imported .sqsh) at /enroot/sa
- pre-staged model weights / HF hub cache at /nfsdata/sa/models/, bind-mounted over the container HF_HUB_CACHE so `hf download "$MODEL"` reuses the staged models--org--name caches instead of re-downloading from HF.

Both paths are node-local ext4 at the same path on every compute node; import and run share one Slurm job on a single node, so node-local storage suffices.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
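The cache reuse described above depends on the Hugging Face hub cache layout, where a repo `org/name` is stored under a `models--org--name` directory. A minimal sketch of that naming and of a `host:container` bind-mount spec follows; the helper names and the container-side cache path are assumptions for illustration and are not taken from launch_mi325x-vultr.sh.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and CONTAINER_HF_HUB_CACHE are hypothetical.
set -euo pipefail

# Map a Hugging Face repo id "org/name" to its hub cache directory name.
hf_cache_dir_name() {
  local model="$1"
  printf 'models--%s\n' "${model//\//--}"
}

# Build a "host:container" bind-mount spec, trimming trailing slashes.
hf_cache_mount() {
  local host_cache="$1" container_cache="$2"
  printf '%s:%s\n' "${host_cache%/}" "${container_cache%/}"
}

MODEL="MiniMaxAI/MiniMax-M3-MXFP8"
STAGED_CACHE="/nfsdata/sa/models/"                      # path from the PR description
CONTAINER_HF_HUB_CACHE="/root/.cache/huggingface/hub"   # assumed container path

echo "staged dir: ${STAGED_CACHE%/}/$(hf_cache_dir_name "$MODEL")"
echo "mount spec: $(hf_cache_mount "$STAGED_CACHE" "$CONTAINER_HF_HUB_CACHE")"
```

If the staged directory for a model is missing on a node, `hf download` would presumably fall back to fetching from HF, so checking for the directory before launch is one way to catch an unstaged model early.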
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27453624071

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27453730455
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 441ba6d.
image: vllm/vllm-openai-rocm:minimax-m3
model: MiniMaxAI/MiniMax-M3-MXFP8
model-prefix: minimaxm3
runner: mi325x
Wrong runner type in config
High Severity
The new Vultr MiniMax-M3 entry sets runner to mi325x, so CI schedules the AMDS fleet and launch_mi325x-amds.sh instead of mi325x-vultr and launch_mi325x-vultr.sh. Staged weights at /nfsdata/sa/models/ and enroot cache at /enroot/sa are never used for this config.
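The mismatch can be made concrete with a small lookup from runner label to launch script. This is a sketch under the assumption that each runner label resolves to exactly one launcher, as the finding above describes; the function name is hypothetical and the real CI dispatch may work differently.

```shell
#!/usr/bin/env bash
# Sketch only: launch_script_for_runner is a hypothetical helper, not CI code.
set -euo pipefail

# Resolve a runner label to the launch script the review comment associates with it.
launch_script_for_runner() {
  case "$1" in
    mi325x)       echo "runners/launch_mi325x-amds.sh" ;;
    mi325x-vultr) echo "runners/launch_mi325x-vultr.sh" ;;
    *)            echo "unknown runner: $1" >&2; return 1 ;;
  esac
}

# The flagged config uses "mi325x"; the Vultr entry would need "mi325x-vultr".
echo "runner: mi325x       -> $(launch_script_for_runner mi325x)"
echo "runner: mi325x-vultr -> $(launch_script_for_runner mi325x-vultr)"
```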
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27454108525
Node chi-mi325x-pod1-027 fails SLURM resume/boot: salloc grants an allocation then relinquishes it with "Something is wrong with the boot of the nodes" (run 27454108525), gating the minimaxm3-fp8-mi325x canary and thus the whole sweep. Add it to the --exclude list alongside the existing pod1-121 exclusion until the node is repaired.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
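One way to keep such exclusions manageable is to hold the bad hosts in an array and join them into a single `--exclude=` argument for salloc. The sketch below assumes that approach; the helper name is hypothetical, and only chi-mi325x-pod1-027 is listed because the full hostname of the pod1-121 node is not spelled out here.

```shell
#!/usr/bin/env bash
# Sketch only: exclude_arg is a hypothetical helper, not the launcher's code.
set -euo pipefail

# Join node names into one "--exclude=a,b,..." argument; print nothing if empty.
exclude_arg() {
  if [ "$#" -eq 0 ]; then
    return 0
  fi
  local IFS=,
  printf -- '--exclude=%s\n' "$*"
}

# Known-bad hosts; the existing pod1-121 entry would be appended with its full hostname.
BAD_NODES=(chi-mi325x-pod1-027)

# Dry run: show the argument rather than calling salloc.
echo "salloc $(exclude_arg "${BAD_NODES[@]}") ..."
```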
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=27455128357


Note
Low Risk
Changes are additive benchmark/CI infrastructure (configs, launcher, shell recipe) with no production auth or data-path logic; main risk is long, resource-heavy CI sweeps on new hardware.
Overview
Adds day-zero MiniMax-M3 MXFP8 single-node vLLM benchmarking on the Vultr MI325X fleet, alongside infrastructure to run it in CI.
A new `mi325x-vultr` runner pool (six GitHub runners) is wired to `launch_mi325x-vultr.sh`, which follows the existing MI325X Slurm/enroot flow but uses Vultr-specific enroot cache (`/enroot/sa`), staged HF hub cache bind-mount (`/nfsdata/sa/models/`), and Slurm node excludes for known-bad hosts.

`minimaxm3-fp8-mi325x-vllm` in `amd-master.yaml` registers `MiniMaxAI/MiniMax-M3-MXFP8` on `vllm/vllm-openai-rocm:minimax-m3` with fixed-seq-len sweeps (1k1k / 8k1k) over TP4/TP8, TEP (EP4/EP8), and DEP; TP2 is omitted vs B300 because ~444 GB MXFP8 would OOM on 256 GB GPUs.

`minimaxm3_fp8_mi325x.sh` implements the ROCm recipe: mandatory `--block-size 128`, `TRITON_ATTN`, `--language-model-only`, conc-scaled CUDA graphs, extended engine ready timeout, and standard MI325X ROCm env (AITER, HIP/Ray).

`perf-changelog.yaml` documents the new config key.

Reviewed by Cursor Bugbot for commit 0bd8981.
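The TP2 omission can be sanity-checked with rough arithmetic. The sketch below assumes the ~444 GB of weights split evenly across tensor-parallel ranks and uses integer division; the helper names are hypothetical, and it ignores KV cache, activations, and graph capture, which also need GPU memory.

```shell
#!/usr/bin/env bash
# Sketch only: even weight sharding is an assumption; figures come from the summary.
set -euo pipefail

# Approximate weight footprint per GPU for a given tensor-parallel degree.
per_gpu_weights_gb() {
  local total_gb="$1" tp="$2"
  echo $(( total_gb / tp ))
}

# Memory left on each GPU after weights, before KV cache and activations.
headroom_gb() {
  local total_gb="$1" tp="$2" gpu_gb="$3"
  echo $(( gpu_gb - total_gb / tp ))
}

WEIGHTS_GB=444   # approximate MXFP8 checkpoint size from the summary
GPU_GB=256       # per-GPU memory from the summary

for tp in 2 4 8; do
  echo "TP${tp}: ~$(per_gpu_weights_gb "$WEIGHTS_GB" "$tp") GB weights/GPU," \
       "~$(headroom_gb "$WEIGHTS_GB" "$tp" "$GPU_GB") GB left"
done
```

Under these assumptions TP2 leaves only a few tens of GB per GPU for everything other than weights, which is consistent with the summary's expectation that TP2 would run out of memory, while TP4 and TP8 leave well over half of each GPU free.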